We present a stochastic finite-state model for segmenting Chinese text intodictionary entries and productively derived words, and providing pronunciationsfor these words; the method incorporates a class-based model in its treatmentof personal names. We also evaluate the system's performance, taking intoaccount the fact that people often do not agree on a single segmentation.
展开▼